Eliminating NULLs with Subsumption and Complementation

Authors

Jens Bleiholder

Melanie Herschel

Felix Naumann

Abstract

In a data integration process, an important step after schema matching and duplicate detection is data fusion. It is concerned with combining, or merging, different representations of one real-world object into a single, consistent representation. Many different conflict resolution strategies can be applied to resolve potential data conflicts. In particular, some representations might contain missing values (NULL values) where others provide a non-NULL value. A common strategy for handling such NULL values is to replace them with the existing values from other representations. This increases the conciseness of the representation without losing information. Two examples of relational operators that implement such a strategy are minimum union and complement union, together with their unary building blocks, subsumption and complementation. In this paper, we define and motivate the use of these operators in data integration, consider them as database primitives, and show how to optimize query plans in the presence of subsumption and complementation using rule-based plan transformations.

1 Data Fusion as Part of Data Integration

Data integration can be seen as a three-step process consisting of schema matching, duplicate detection, and data fusion. The first step resolves schematic conflicts, for instance through schema matching and schema mapping techniques. Next, duplicate detection resolves conflicts at the object level, in particular by detecting two (or more!) representations of the same real-world object, called duplicates. For instance, given two data sources describing persons, schema matching determines that the concatenation of the attributes firstname and lastname in Source 1 is semantically equivalent to the attribute name in Source 2. Duplicate detection then recognizes that the entry John M. Smith in Source 1 represents the same person as the entry J. M. Smith in Source 2.
This article focuses on the step that succeeds both schema matching and duplicate detection, namely data fusion. This final step combines different representations of the same real-world object (previously identified …

Copyright 2011 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering
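The abstract names subsumption and complementation without defining them in this excerpt. The following Python sketch illustrates one common reading of the two operators and should not be taken as the authors' formal definitions. It assumes that a tuple subsumes another when both agree on every non-NULL value of the second and the second has strictly more NULLs, and that two tuples complement each other when they never conflict, share at least one non-NULL value, and neither subsumes the other. Tuples are modeled as Python tuples with `None` standing in for NULL; the function names and the sample data are hypothetical.

```python
from typing import List, Optional, Tuple

# A row is a fixed-length tuple; None plays the role of NULL.
Row = Tuple[Optional[str], ...]


def subsumes(t1: Row, t2: Row) -> bool:
    """Assumed definition: t1 agrees with t2 on every non-NULL value
    of t2, and t2 has strictly more NULLs than t1."""
    if len(t1) != len(t2):
        return False
    agrees = all(b is None or a == b for a, b in zip(t1, t2))
    more_nulls = sum(v is None for v in t2) > sum(v is None for v in t1)
    return agrees and more_nulls


def remove_subsumed(rows: List[Row]) -> List[Row]:
    """Unary subsumption: drop every row subsumed by some other row."""
    return [t for t in rows if not any(subsumes(o, t) for o in rows)]


def complements(t1: Row, t2: Row) -> bool:
    """Assumed definition: no conflicting values, at least one shared
    non-NULL value, and neither tuple subsumes (or equals) the other."""
    if len(t1) != len(t2) or t1 == t2:
        return False
    compatible = all(a is None or b is None or a == b for a, b in zip(t1, t2))
    shared = any(a is not None and a == b for a, b in zip(t1, t2))
    return (compatible and shared
            and not subsumes(t1, t2) and not subsumes(t2, t1))


def merge(t1: Row, t2: Row) -> Row:
    """Fill each NULL of t1 with the corresponding value of t2."""
    return tuple(a if a is not None else b for a, b in zip(t1, t2))


# Hypothetical person records: (name, city, birth_year).
people = [
    ("J. M. Smith", "Berlin", None),
    ("J. M. Smith", None, None),      # subsumed by the first row
    ("J. M. Smith", None, "1970"),    # complements the first row
]
kept = remove_subsumed(people)
fused = merge(kept[0], kept[1]) if complements(kept[0], kept[1]) else None
```

In this sketch the pairwise comparison is quadratic in the number of rows, which is adequate for illustration only; the paper's interest in treating the operators as database primitives and optimizing query plans around them concerns exactly how to avoid such naive evaluation.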

Similar resources

Efficient Büchi Universality Checking

The complementation of Büchi automata, required for checking automata universality, remains one of the outstanding automata-theoretic challenges in formal verification. Early constructions using a Ramsey-based argument have been supplanted by rank-based constructions with exponentially better bounds. The best rank-based algorithm for Büchi universality, by Doyen and Raskin, employs a subsumptio...


Preprocessing Techniques for QBFs

In this paper we present sQueezeBF, an effective preprocessor for QBFs that combines various techniques for eliminating variables and/or clauses. In particular sQueezeBF combines (i) variable elimination by Q-resolution and equality reduction, and (ii) clause simplification via subsumption and self-subsumption resolution. The experimental analysis shows that sQueezeBF can produce significant re...


A codon-shuffling method to prevent reversion during production of replication-defective herpesvirus stocks: Implications for herpesvirus vaccines

Herpesviruses establish life-long chronic infections that place infected hosts at risk for severe disease. Herpesvirus genomes readily undergo homologous recombination (HR) during productive replication, often leading to wild-type (WT) reversion during complementation of replication-defective and attenuated viruses via HR with the helper gene provided in trans. To overcome this barrier, we deve...


Universal (and Existential) Nulls

Incomplete Information research is quite mature when it comes to so called existential nulls, where an existential null is a value stored in the database, representing an unknown object. For some reason universal nulls, that is, values representing all possible objects, have received almost no attention. We remedy the situation in this paper, by showing that a suitable finite representation mec...


Vandermonde-Lagrange Mutually Orthogonal Flexible Transceivers for Blind CDMA in Unknown Multipath

A Mutually-Orthogonal Usercode-Receiver (AMOUR) system was recently proposed to guarantee identifiability of transmitted symbols irrespective of channel nulls in addition to offering deterministic MUI elimination and low complexity transceivers. Motivated by the desire to equip the AMOUR with added flexibility, we develop in this paper a Vandermonde–Lagrange AMOUR system that achieves the same ...


Journal: IEEE Data Eng. Bull.

Volume 34

Publication date: 2011
